-
Notifications
You must be signed in to change notification settings - Fork 753
Module graceful shutdown support #4031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Module graceful shutdown support #4031
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@rameshraghupathy Could you please update PR description more in detail of the code changes being done? |
|
@rameshraghupathy The table being updated from chassisd is CHASSIS_MODULE_TABLE (and not CHASSIS_MODULE_INFO_TABLE) is that implementation also being changed? |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
… format, swsscommon for DB connection etc
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Provide support for SmartSwitch DPU module graceful shutdown.
Description:
Single source of truth for transitions
All components now use
sonic_platform_base.module_base.ModuleBasehelpers:set_module_state_transition(db, name, transition_type)clear_module_state_transition(db, name)get_module_state_transition(db, name) -> dictis_module_state_transition_timed_out(db, name, timeout_secs) -> boolEliminates duplicated logic and race-prone direct Redis writes.
Correct table everywhere
CHASSIS_MODULE_TABLE(replacesCHASSIS_MODULE_INFO_TABLE).Ownership & lifecycle
The initiator of an operation (
startup/shutdown/reboot) sets:state_transition_in_progress=Truetransition_type=<op>transition_start_time=<utc-iso8601>The platform (
set_admin_state()) is responsible for clearing:state_transition_in_progress=Falsetransition_end_time=<epoch>(or similar end stamp).CLI pre-clears only when a prior transition is timed out.
Timeouts & policy
Platform JSON path only:
/usr/share/sonic/device/{plat}/platform.json; else constants.Typical production values used:
startup: 180s,shutdown: 180s(≈graceful_wait 60s + power 120s),reboot: 120s.Graceful wait (e.g., waiting for “Graceful shutdown complete”) is a platform policy and implemented inside platform
set_admin_state()—not in ModuleBase.Boot behavior
chassisdon start:set_initial_dpu_admin_state()which marks transitions via ModuleBase before calling platformset_admin_state().gNOI shutdown daemon
Listens on
CHASSIS_MODULE_TABLEand triggers only when:state_transition_in_progress=Trueandtransition_type=shutdown.Never clears the flag (ownership stays with the platform).
Bounded RPC timeouts and robust Redis access (swsssdk/swsscommon).
CLI (
config chassis modules …)is_module_state_transition_timed_out()→ auto-clear then proceed.startup/shutdown; platform clears on completion.Redis robustness
hset(mapping=...)usage.Race reduction & consistency
transition_start_time; clears may add an end stamp.Change scope
HLD: # 1991 sonic-net/SONiC#1991
sonic-platform-common: #567 sonic-net/sonic-platform-common#567
sonic-host-services: sonic-net/sonic-host-services#255
sonic-platform-daemons: sonic-net/sonic-platform-daemons#667
How to verify it
Issue the "config chassis modules shutdown DPUx" command
Verify the DPU module is gracefully shut by checking the logs in /var/log/syslog on both NPU and DPU